Unbiased Assesment of Learning Algorithms
نویسندگان
چکیده
In order to rank the performance of machine learning algorithms, many researchers conduct experiments on benchmark data sets. Since most learning algorithms have domain-specific parameters, it is a popular custom to adapt these parameters to obtain a minimal error rate on the test set. The same rate is then used to rank the algorithm, which causes an opt imistic bias. We quantify this bias, showing, in particular, that an algorithm with more parameters wi l l probably be ranked higher than an equally good algorithm with fewer parameters. We demonstrate this result, showing the number of parameters and trials required in order to pretend to outperform C4.5 or FOIL, respectively, for various benchmark problems. We then describe out how unbiased ranking experiments should be conducted.
منابع مشابه
Comparative Analysis of Machine Learning Algorithms with Optimization Purposes
The field of optimization and machine learning are increasingly interplayed and optimization in different problems leads to the use of machine learning approaches. Machine learning algorithms work in reasonable computational time for specific classes of problems and have important role in extracting knowledge from large amount of data. In this paper, a methodology has been employed to opt...
متن کاملA U -statistic estimator for the variance of resampling-based error estimators
We revisit resampling procedures for error estimation in binary classification in terms of U-statistics. In particular, we exploit the fact that the error rate estimator involving all learning-testing splits is a U-statistic. Therefore, several standard theorems on properties of U-statistics apply. In particular, it has minimal variance among all unbiased estimators and is asymptotically normal...
متن کاملSequential and Mixed Genetic Algorithm and Learning Automata (SGALA, MGALA) for Feature Selection in QSAR
Feature selection is of great importance in Quantitative Structure-Activity Relationship (QSAR) analysis. This problem has been solved using some meta-heuristic algorithms such as: GA, PSO, ACO, SA and so on. In this work two novel hybrid meta-heuristic algorithms i.e. Sequential GA and LA (SGALA) and Mixed GA and LA (MGALA), which are based on Genetic algorithm and learning automata for QSAR f...
متن کاملSequential and Mixed Genetic Algorithm and Learning Automata (SGALA, MGALA) for Feature Selection in QSAR
Feature selection is of great importance in Quantitative Structure-Activity Relationship (QSAR) analysis. This problem has been solved using some meta-heuristic algorithms such as: GA, PSO, ACO, SA and so on. In this work two novel hybrid meta-heuristic algorithms i.e. Sequential GA and LA (SGALA) and Mixed GA and LA (MGALA), which are based on Genetic algorithm and learning automata for QSAR f...
متن کاملMachine learning algorithms in air quality modeling
Modern studies in the field of environment science and engineering show that deterministic models struggle to capture the relationship between the concentration of atmospheric pollutants and their emission sources. The recent advances in statistical modeling based on machine learning approaches have emerged as solution to tackle these issues. It is a fact that, input variable type largely affec...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1997